
Design and Execution of make-like, distributed Analyses based on Spotify's Pipelining Package Luigi



Abstract

In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually, which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi, which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and web-based status visualization. In addition to already built-in features for remote jobs and file systems like Hadoop and HDFS, we added support for WLCG infrastructure such as LSF and CREAM job submission, as well as remote file access through the Grid File Access Library. Furthermore, we implemented automated resubmission functionality, software sandboxing, and a command line interface with auto-completion for a convenient working environment. For the implementation of a $t\bar{t}H$ cross section measurement, we created a generic Python interface that provides programmatic access to all external information such as datasets, physics processes, statistical models, and additional files and values. In summary, the setup enables the execution of the entire analysis in a parallelized and distributed fashion with a single command.
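The make-like execution model described in the abstract can be illustrated with a small sketch. This is not Luigi's implementation and not the authors' code: it is a standard-library stand-in that borrows the method names `requires`, `output`, `run`, and `complete` from Luigi's `Task` class. The task names `SelectEvents` and `CountEvents`, the file names, and the `build` and `demo` helpers are hypothetical and chosen only for illustration. The sketch assumes that a task counts as complete once its output file exists.

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a workflow task (not Luigi's actual class)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Tasks that must be complete before this one can run.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def complete(self):
        # As with make, a task counts as done once its output exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


class SelectEvents(Task):
    def output(self):
        return os.path.join(self.workdir, "selected.txt")

    def run(self):
        # Placeholder payload standing in for a real event selection.
        with open(self.output(), "w") as f:
            f.write("event1\nevent2\nevent3\n")


class CountEvents(Task):
    def requires(self):
        return [SelectEvents(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "count.txt")

    def run(self):
        # Count the lines written by the upstream task.
        with open(self.requires()[0].output()) as f:
            n = sum(1 for _ in f)
        with open(self.output(), "w") as f:
            f.write(str(n))


def build(task, executed=None):
    """Run `task` after its dependencies, skipping anything already complete."""
    if executed is None:
        executed = []
    if task.complete():
        return executed
    for dep in task.requires():
        build(dep, executed)
    task.run()
    executed.append(type(task).__name__)
    return executed


def demo():
    # Build the same target twice in a scratch directory.
    with tempfile.TemporaryDirectory() as workdir:
        first = build(CountEvents(workdir))
        second = build(CountEvents(workdir))
        with open(CountEvents(workdir).output()) as f:
            count = f.read()
    return first, second, count
```

Calling `demo()` requests only the final task. The first build runs the upstream task before the downstream one, and the second build executes nothing because every output already exists. Scheduling, parallel execution, and remote job submission, which the abstract attributes to Luigi and the authors' extensions, are deliberately left out of this sketch.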
